Abstract

Businesses (retailers) often wish to offer personalized advertisements (coupons) to individuals (consumers), but
run the risk of strong reactions from consumers who want a customized shopping experience but feel their privacy
has been violated. Existing models for privacy such as differential privacy or information theory try to quantify
privacy risk but do not capture the subjective experience and heterogeneous expression of privacy-sensitivity. We
propose a Markov decision process (MDP) model to capture (i) different consumer privacy sensitivities via a
time-varying state; (ii) different coupon types (action set) for the retailer; and (iii) the action-and-state-dependent cost for
perceived privacy violations. For the simple case with two states (“Normal” and “Alerted”), two coupons (targeted
and untargeted), and consumer behavior statistics known to the retailer, we show that a stationary
threshold-based policy is the optimal coupon-offering strategy for a retailer that wishes to minimize its expected discounted
cost. The threshold is a function of all model parameters; the retailer offers a targeted coupon if its belief that the
consumer is in the “Alerted” state is below the threshold. We extend this two-state model to consumers with multiple
privacy-sensitivity states as well as coupon-dependent state transition probabilities. Furthermore, we study the case
with imperfect (noisy) cost feedback from consumers and uncertain initial belief state.

Keywords: Privacy, Markov decision processes, retailer-consumer interaction, optimal policies.


I. Introduction 

Programs such as retailer “loyalty cards” allow companies to automatically track a customer’s financial
transactions, purchasing behavior, and preferences. They can then use this information to offer customized incentives, such
as discounts on related goods. Consumers may benefit from the retailer’s knowledge by using more of these targeted
discounts or coupons while shopping. However, in some cases the coupon offer implies that the retailer has learned
something sensitive or private about the consumer. For example, a retailer could infer a consumer’s pregnancy [1].
Such violations may make consumers skittish about purchasing from such retailers.

However, modeling the privacy-sensitivity of a consumer is not always straightforward: widely-studied models 
for quantifying privacy risk using differential privacy or information theory do not capture the subjective experience 
and heterogeneous expression of consumer privacy. The goal of this paper is to introduce a framework to model the 
consumer-retailer interaction problem and better understand how retailers can develop coupon-offering policies that
balance their revenue objectives while being sensitive to consumer privacy concerns. The main challenge for the
retailer is that the consumer’s responses to coupons are not known a priori; furthermore, consumers do not “add
noise” to their purchasing behavior as a mechanism to stay private. Rather, the offer of a coupon may provoke a 
reaction from the consumer, ranging from “unaffected” to “ambiguous” or “partially concerned” to “creeped out.” 
This reaction is mediated by the consumer’s sensitivity level to privacy violations, and it is these levels that we 
seek to model via a Markov decision process. These privacy-sensitivity states of the consumers are often revealed 
to the retailer through their purchasing patterns. In the simplest case, they may accept or reject a targeted coupon. 
We capture these aspects in our model and summarize our main contributions below. 

A. Main Contributions 

We propose a partially-observed Markov decision process (POMDP) model for this problem in which the 
consumer’s state encodes their privacy sensitivity, and the retailer can offer different levels of privacy-violating 
coupons. The simplest instance of our model is one with two states for the consumer, denoted as “Normal” and 
“Alerted,” and two types of coupons: untargeted low privacy risk (LP) or targeted high privacy risk (HP). At each time, the
retailer may offer a coupon and the consumer transitions from one state to another according to a Markov chain that 
is independent of the offered coupon. The retailer suffers a cost that depends both on the type of coupon offered 
and the state of the consumer. The costs reflect the advantage of offering targeted HP coupons relative to untargeted 
LP ones while simultaneously capturing the risk of doing so when the consumer is already “Alerted”. 

Under the assumption that the retailer (via surveys or prior knowledge) knows the statistics of the consumer 
Markov process, i.e., the likelihoods of becoming “Alerted” and staying “Alerted”, and a belief about the initial
consumer state, we study the problem of determining the optimal coupon-offering policy that the retailer should 
adopt to minimize the long-term discounted costs of offering coupons. We extend the simple model above to multiple 
states and coupon-dependent transitions. We model the latter via two Markov processes for the consumer, one for
each type (HP or LP) of coupon, such that a persnickety consumer who is easily “Alerted” is more likely to
become so when offered an HP (relative to an LP) coupon. Furthermore, for noisy costs, we propose a heuristic method
to compute the decision policy. Moreover, if the initial belief state is unknown to the retailer, we use a Bayesian 
model to estimate the belief state. Our main results can be summarized as follows: 

1) There exists an optimal, stationary, threshold-based policy for offering coupons such that an HP coupon is
offered only if the belief of being in the “Alerted” state at each interaction time is below a certain threshold; 
this threshold is a function of all the model parameters. This structural result holds for multiple states and 
coupon-dependent transitions. 

2) The threshold for offering a targeted HP coupon increases in the following cases: 

a) once “Alerted,” the consumer remains so for a while: the retailer is more willing to take risks since
the consumer takes a while to transition to “Normal”;

b) the consumer is very unlikely to get “Alerted”; 
c) the cost of offering an untargeted LP coupon is high and close to the cost of offering a targeted HP
coupon to an “Alerted” consumer; and 

d) when the retailer does not discount the future heavily, i.e., the retailer stands to benefit by offering HP 
coupons for a larger set of beliefs about the consumer’s state. 

3) For the coupon-dependent Markov model for the consumer, the threshold is smaller than in the coupon-independent
case, which captures the fact that highly sensitive consumers force the retailer to behave
more conservatively.

4) By adopting a heuristic threshold policy computed by the mean value of costs, the retailer can minimize 
the discounted cost effectively even if costs are noisy. Moreover, the Bayesian approach helps the retailer to 
estimate the consumer state when the initial belief state is unknown. 

Our results use many fundamental tools and techniques from the theory of MDPs through appropriate and meaningful 
problem modeling. We briefly review the related literature in consumer privacy studies as well as MDPs. 

B. Related Work 

Several economic studies have examined consumers’ attitudes towards privacy via surveys and data analysis,
including studies on the benefits and costs of using private data (e.g., Acquisti and Grossklags in [2]). On the other
hand, formal methods such as differential privacy are finding use in modeling the value of private data for market
design [3] and for the problem of partitioning goods with private valuation functions amongst agents [4]. In
these models the goal is to elicit private information from individuals. Venkitasubramaniam [5] recently used an
MDP model to study data sharing in control systems with time-varying state. He minimizes the weighted sum of
the utility (benefit) that the system achieves by sharing data (e.g., with a data collector) and the resulting privacy
leakage, quantified using the information-theoretic equivocation function. In our work we do not quantify privacy
loss directly; instead we model privacy-sensitivity and resulting user behavior via MDPs to determine interaction
policies that can benefit both consumers and retailers. To the best of our knowledge, a formal model for
consumer-retailer interactions and the related privacy issues has not been studied before; in particular, our work focuses on
explicitly considering the consequence to the retailer of the consumers’ awareness of privacy violations.

Markov decision processes (MDPs) have been widely used for decades across many fields [6], [7]; in particular,
our model is related to problems in control with communication constraints [8], [9], where state estimation has a cost.
Our costs are action and state dependent and we consider a different optimization problem. Classical target-search
problems [10] also have optimal policies that are thresholds, but in our model the retailer’s goal is not to estimate
the consumer state but to minimize cost. The model we use is most similar to Ross’s model of product quality
control with deterioration [11], which was more recently used by Laourine and Tong to study the Gilbert-Elliott
channel in wireless communications [12], in which the channel has two states and the transmitter has two actions
(transmit or not). We cannot apply their results directly due to our different cost structure, but use ideas from their
proofs. Furthermore, we go beyond these works to study privacy-utility tradeoffs in consumer-retailer interactions
with more than two states and action-dependent transition probabilities. We apply more general MDP analysis tools
to address our formal behavioral model for privacy-sensitive consumers.


While the MDP model used in this paper is simple, its application to the problem of revenue maximization with 
privacy-sensitive consumers is novel. We show that the optimal stationary policy exists and it is a threshold on the 
probability of the consumer being alerted. We extend the model to cases of consumers with multiple states and 
consumers with coupon-dependent transition probabilities. Our basic model assumes the probability of the consumer 
being alerted can be inferred from the received costs. When the costs are stochastic, we use a Bayesian estimator to 
track this probability and propose a heuristic coupon offering policy for this setting. In the conclusion we describe 
several other interesting avenues for future work. 

The paper is organized as follows: Section II introduces the system model and its extensions. The main results for
known consumer statistics are presented in Section III. Sections IV and V discuss optimal stationary policy results
for consumers with coupon-dependent responses and for noisy costs with unknown initial belief, respectively. Finally,
some concluding remarks and future work are provided in Section VI.


II. System Model 

We model interactions between a retailer and a consumer via a discrete-time system (Figure 1). At each time $t$,
the consumer has a discrete-valued state and the retailer may offer one of two coupons: high privacy risk (HP) or
low privacy risk (LP). The consumer responds to the personalized coupon by imposing a cost on the retailer that
depends on the coupon offered and the consumer’s own state. For example, a consumer who is “alerted” (privacy-aware) may
respond to an HP coupon by imposing a high cost on the retailer, such as reducing purchases at the retailer. The
retailer’s goal is to decide which type of coupon to offer at each time t to minimize its cost. 


A. Consumer with Two States and Coupon Independent Transitions. 

1) Consumer Model: 

Modelling Assumption 1: (Consumer’s state) We model the consumer’s response to coupons by assuming them 
to be in one of several states. Each state corresponds to a type of consumer behavior in terms of purchasing 
(Privacy sensitivity). 

For this paper, we first focus on the two-state case; the consumer may be Normal or Alerted. Later we will extend 
this model to multiple consumer states, consumer with coupon dependent response, and unknown initial consumer 
state cases. The consumer state at time $t$ is denoted by $G_t \in \{\text{Normal}, \text{Alerted}\}$. If a consumer is in the Normal state,
the consumer is less sensitive to coupons from the retailer in terms of privacy. However, in the Alerted state, the
consumer is likely to be more sensitive to coupons offered by the retailer, since the consumer is more cautious about revealing
information to the retailer. The evolution of the consumer state is modeled as an infinite-horizon discrete-time Markov
chain (Figure 1). The consumer starts out in a random initial state unknown to the retailer and the transition of the
consumer state is independent of the action of the retailer. A belief state is a probability distribution over possible
states in which the consumer could be. The belief of the consumer being in the Alerted state at time $t$ is denoted by
$p_t$. We define $\lambda_{N,A} = \Pr[G_t = \text{Alerted} \mid G_{t-1} = \text{Normal}]$ to be the transition probability from the Normal state to
the Alerted state and $\lambda_{A,A} = \Pr[G_t = \text{Alerted} \mid G_{t-1} = \text{Alerted}]$ to be the probability of staying in the Alerted state when
the previous state is also Alerted. The transition matrix $\Lambda$ of the Markov chain can be written as

$$\Lambda = \begin{pmatrix} 1-\lambda_{N,A} & \lambda_{N,A} \\ 1-\lambda_{A,A} & \lambda_{A,A} \end{pmatrix}. \tag{1}$$

We assume the transition probabilities are known to the retailer; this may come from statistical analysis such as a 
survey of consumer attitudes. The one step transition function, defined by 

$$T(p_t) = (1-p_t)\lambda_{N,A} + p_t\lambda_{A,A}, \tag{2}$$

represents the belief that the consumer is in the Alerted state at time $t+1$ given $p_t$, the Alerted-state belief at time $t$.
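
As a minimal sketch of the one-step transition in (2), the following Python snippet propagates the Alerted-state belief forward when no new information arrives. The function names and the numerical values of $\lambda_{N,A}$ and $\lambda_{A,A}$ are illustrative assumptions, not values prescribed by the model.

```python
def propagate_belief(p, lam_na, lam_aa):
    """One-step belief update T(p) = (1 - p) * lam_na + p * lam_aa, as in (2)."""
    return (1.0 - p) * lam_na + p * lam_aa


def propagate_n(p, lam_na, lam_aa, n):
    """Apply T n times: the belief after n steps without any new observation."""
    for _ in range(n):
        p = propagate_belief(p, lam_na, lam_aa)
    return p


# Assumed example values: lam_na = Pr[Normal -> Alerted], lam_aa = Pr[Alerted -> Alerted].
lam_na, lam_aa = 0.1, 0.7
# Belief that T leaves unchanged, obtained by solving T(p) = p.
fixed_point = lam_na / (1.0 - lam_aa + lam_na)
```

Because $T$ is affine in $p_t$ with slope $\lambda_{A,A} - \lambda_{N,A}$, repeated application moves any starting belief toward the fixed point computed on the last line.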

Modelling Assumption 2: (State transitions) Consumers have an inertia in that they tend to stay in the same 
state. Moreover, once consumers feel their privacy is violated, it will take some time for them to come back to 
Normal state. 

The above assumption implies $\lambda_{A,A} > 1-\lambda_{A,A}$, $1-\lambda_{N,A} > \lambda_{N,A}$, and $\lambda_{N,A} > 1-\lambda_{A,A}$. Thus, by combining
the above three inequalities, we have $\lambda_{A,A} > \lambda_{N,A}$.

2) Retailer Model: At each time $t$, the retailer can take an action by offering a coupon to the consumer. We
define the action at time $t$ to be $u_t \in \{\text{HP}, \text{LP}\}$, where HP denotes offering a high privacy risk coupon (e.g., a
targeted coupon) and LP denotes offering a low privacy risk coupon (e.g., a generic coupon). The retailer’s utility is
modeled by a cost (negative revenue) which depends on the consumer’s state and the type of coupon being offered.
If the retailer offers an LP coupon, it suffers a cost $C_L$ independent of the consumer’s state: offering LP coupons
does not reveal anything about the state. However, if the retailer offers an HP coupon, then the cost is $C_{HN}$ or
$C_{HA}$ depending on whether the consumer’s state is Normal or Alerted. Offering an HP (high privacy risk, targeted)
coupon to a Normal consumer should incur a low cost (high reward), but offering an HP coupon to an Alerted
consumer should incur a high cost (low reward) since an Alerted consumer is privacy-sensitive. Thus, we assume
$C_{HN} < C_L < C_{HA}$.

Under these conditions, the retailer’s objective is to choose $u_t$ at each time $t$ to minimize the total cost incurred
over the entire time horizon. The HP coupon reveals information about the state through the cost, but is risky if 
the consumer is alerted, creating a tension between cost minimization and acquiring state information. 

3) Minimum Cost Function: We define $C(p_t,u_t)$ to be the expected cost incurred from an individual consumer
at time $t$, where $p_t$ is the probability that the consumer is in the Alerted state and $u_t$ is the retailer’s action:

$$C(p_t,u_t) = \begin{cases} C_L & \text{if } u_t = \text{LP}, \\ (1-p_t)C_{HN} + p_t C_{HA} & \text{if } u_t = \text{HP}. \end{cases} \tag{3}$$
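
The expected cost in (3) can be sketched in a few lines of Python. The function name and the sample cost values below are assumptions chosen only to satisfy the ordering $C_{HN} < C_L < C_{HA}$.

```python
def expected_cost(p, action, c_l, c_hn, c_ha):
    """Expected one-step cost C(p, u) from (3) for belief p and action 'LP' or 'HP'."""
    if action == "LP":
        return c_l                            # state-independent cost
    if action == "HP":
        return (1.0 - p) * c_hn + p * c_ha    # average over Normal / Alerted
    raise ValueError("action must be 'LP' or 'HP'")


# Assumed example costs with C_HN < C_L < C_HA.
C_L, C_HN, C_HA = 3.0, 1.0, 12.0
# Belief at which both actions have the same expected one-step cost.
indifference_belief = (C_L - C_HN) / (C_HA - C_HN)
```

With these assumed numbers, the HP coupon has the lower expected one-step cost only for beliefs below `indifference_belief`; the optimal policy additionally accounts for future costs.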

Since the retailer learns the consumer state from the incurred cost only when an HP coupon is offered, the state of
the consumer may not be directly observable to the retailer. Therefore, the problem is actually a Partially Observable
Markov Decision Process (POMDP) [13].

Figure 1: Markov state transition model for a two-state consumer.

We model the cost of violating a consumer’s privacy as a short-term effect. Thus, we adopt a discounted cost
model with discount factor $\beta \in (0,1)$. At each time $t$, the retailer has to choose which action $u_t$ to take in order
to minimize the expected discounted cost over an infinite horizon. A policy $\pi$ for the retailer is a rule that selects a
coupon to offer at each time. Thus, given that the belief of the consumer being in the Alerted state at time $t$ is $p_t$ and
the policy is $\pi$, the infinite-horizon discounted cost starting from $t$ is


$$V_\beta^{\pi,t}(p_t) = E_\pi\left[ \sum_{i=t}^{\infty} \beta^i C(p_i,u_i) \,\middle|\, p_t \right], \tag{4}$$

where $E_\pi$ indicates the expectation over the policy $\pi$. The objective of the retailer is equivalent to minimizing the
discounted cost over all possible policies. Thus, we define the minimum cost function starting from time $t$ over all
policies to be

$$V_\beta^t(p_t) = \min_\pi V_\beta^{\pi,t}(p_t) \quad \text{for all } p_t \in [0,1]. \tag{5}$$


We define $p_{t+1}$ to be the belief of the consumer being in the Alerted state at time $t+1$. The minimum cost function
$V_\beta^t(p_t)$ satisfies the Bellman equation [13]:

$$V_\beta^t(p_t) = \min_{u_t \in \{\text{HP},\text{LP}\}} \{ V_{\beta,u_t}^t(p_t) \}, \tag{6}$$

$$V_{\beta,u_t}^t(p_t) = \beta^t C(p_t,u_t) + V_\beta^{t+1}(p_{t+1} \mid p_t,u_t). \tag{7}$$


An optimal policy is stationary if it is a deterministic function of states, i.e., the optimal action at a particular
state is the optimal action in this state at all times. We define $\mathcal{P} = [0,1]$ to be the belief space and $\mathcal{U} = \{\text{LP}, \text{HP}\}$
to be the action space. In the context of our model, the optimal stationary policy is a deterministic function mapping
$\mathcal{P}$ into $\mathcal{U}$. Since the problem is an infinite-horizon, finite state and finite action MDP with discounted cost, by [14]
there exists an optimal stationary policy $\pi^*$ such that starting from time $t$,

$$V_\beta^t(p_t) = V_\beta^{\pi^*,t}(p_t). \tag{8}$$


Thus, only the optimal stationary policy is considered because it is tractable and achieves the same minimum cost 
as any optimal non-stationary policy. 


By (6) and (7), the minimum cost function evolves as follows. If an HP coupon is offered at time $t$, the retailer
can perfectly infer the consumer state based on the incurred cost. Therefore,

$$V_{\beta,\text{HP}}^t(p_t) = \beta^t C(p_t,\text{HP}) + (1-p_t)V_\beta^{t+1}(\lambda_{N,A}) + p_t V_\beta^{t+1}(\lambda_{A,A}). \tag{9}$$

If an LP coupon is offered at time $t$, the retailer cannot infer the consumer state from the cost since both Normal
and Alerted consumers impose the same cost $C_L$. Hence, the discounted cost function can be written as

$$V_{\beta,\text{LP}}^t(p_t) = \beta^t C(p_t,\text{LP}) + V_\beta^{t+1}(p_{t+1}) = \beta^t C_L + V_\beta^{t+1}(T(p_t)). \tag{10}$$

Correspondingly, the minimum cost function is given by

$$V_\beta^t(p_t) = \min\{ V_{\beta,\text{LP}}^t(p_t), V_{\beta,\text{HP}}^t(p_t) \}. \tag{11}$$
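
The recursion in (9)-(11) can be explored numerically. The sketch below is one possible approach and not the paper's method: it assumes the time-homogeneous form obtained by factoring out $\beta^t$, namely $V(p) = \min\{C_L + \beta V(T(p)),\ (1-p)(C_{HN} + \beta V(\lambda_{N,A})) + p(C_{HA} + \beta V(\lambda_{A,A}))\}$, and solves it by value iteration on a uniform belief grid with linear interpolation. The function name, grid size, and iteration count are assumptions.

```python
def solve_threshold(beta, c_l, c_hn, c_ha, lam_na, lam_aa, grid=1001, iters=300):
    """Approximate V(p) = min{V_LP(p), V_HP(p)} by value iteration on a belief grid
    and return the largest grid belief at which offering HP is still optimal."""
    ps = [i / (grid - 1) for i in range(grid)]
    v = [0.0] * grid

    def interp(values, p):
        # Linear interpolation of a grid function at an arbitrary belief p in [0, 1].
        x = p * (grid - 1)
        lo = min(int(x), grid - 2)
        w = x - lo
        return (1.0 - w) * values[lo] + w * values[lo + 1]

    def action_values(values, p):
        # Time-homogeneous versions of (10) and (9), in that order.
        v_lp = c_l + beta * interp(values, (1.0 - p) * lam_na + p * lam_aa)
        v_hp = ((1.0 - p) * (c_hn + beta * interp(values, lam_na))
                + p * (c_ha + beta * interp(values, lam_aa)))
        return v_lp, v_hp

    for _ in range(iters):
        # Bellman backup (11) applied to every grid point using the previous iterate.
        v = [min(action_values(v, p)) for p in ps]

    hp_region = [p for p in ps if action_values(v, p)[1] <= action_values(v, p)[0]]
    return max(hp_region) if hp_region else None


# Assumed example: beta = 0.9, C_L = 3, C_HN = 1, C_HA = 12, lam_na = 0.3, lam_aa = 0.7.
tau_estimate = solve_threshold(0.9, 3.0, 1.0, 12.0, 0.3, 0.7)
```

The returned value is only accurate up to the grid spacing; a finer grid or more iterations trades run time for precision.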

We now describe some simple extensions of this basic model. 

B. Consumer with Multi-Level Alerted States 

In this section, we study the case in which the consumer has multiple Alerted states. Without loss of generality, we
define $G_t \in \{\text{Normal}, \text{Alerted}_1, \ldots, \text{Alerted}_K\}$ to be the consumer state at time $t$. If the consumer is in the $\text{Alerted}_k$
state, the consumer is even more cautious about coupons than in the $\text{Alerted}_{k-1}$ state. Beliefs of the consumer being in the Normal,
$\text{Alerted}_1, \ldots, \text{Alerted}_K$ states at time $t$ are defined by $\mathbf{p}_t = (p_{N,t}, p_{A_1,t}, \ldots, p_{A_K,t})^T$. At each time $t$, the retailer
can offer either an HP or an LP coupon. Costs of the retailer when an HP coupon is offered while the state of the
consumer is Normal, $\text{Alerted}_1, \ldots, \text{Alerted}_K$ are defined by $\mathbf{C} = (C_{HN}, C_{HA_1}, \ldots, C_{HA_K})^T$. If an LP coupon is
offered, the retailer incurs a cost of $C_L$ regardless of the state. We assume that $C_{HA_K} > \cdots > C_{HA_1} > C_L > C_{HN}$.
The minimum cost function evolves as follows:

$$V_\beta^t(\mathbf{p}_t) = \min\{ V_{\beta,\text{LP}}^t(\mathbf{p}_t), V_{\beta,\text{HP}}^t(\mathbf{p}_t) \}, \tag{12}$$

where $V_{\beta,\text{LP}}^t(\mathbf{p}_t) = \beta^t C_L + V_\beta^{t+1}(\mathbf{p}_{t+1})$ and $V_{\beta,\text{HP}}^t(\mathbf{p}_t) = \beta^t \mathbf{p}_t^T \mathbf{C} + V_\beta^{t+1}(\mathbf{p}_{t+1})$ represent the costs of offering an
LP and an HP coupon, respectively. This model can be generalized to consumers with finitely many states.
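
To make the vector notation concrete, the following sketch evaluates the expected HP cost $\mathbf{p}_t^T \mathbf{C}$ and propagates a belief vector one step. The propagation rule (each next-state belief is the belief-weighted sum of the corresponding transition probabilities) is the standard Markov-chain update and is assumed here, as are the three-state transition matrix and cost values.

```python
def hp_expected_cost(belief, costs):
    """Expected HP cost p^T C for a belief over (Normal, Alerted_1, ..., Alerted_K)."""
    return sum(p * c for p, c in zip(belief, costs))


def propagate(belief, trans):
    """Belief after one unobserved step: next[j] = sum_i belief[i] * trans[i][j]."""
    n = len(belief)
    return [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]


# Assumed example with K = 2 Alerted levels; each row of the matrix sums to one.
TRANS = [
    [0.8, 0.15, 0.05],   # from Normal
    [0.3, 0.5, 0.2],     # from Alerted_1
    [0.1, 0.2, 0.7],     # from Alerted_2
]
COSTS = [1.0, 6.0, 12.0]  # (C_HN, C_HA_1, C_HA_2), increasing with alertness
```

If an HP coupon reveals the current state exactly, the next belief is simply the row of the transition matrix for that state, which is what `propagate` returns for a one-hot belief.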

C. Consumer with Coupon Dependent Transitions 

In the previous formulations, we assume that the consumer’s state transition is independent of the retailer’s 
action. A natural extension is the case where the action of the retailer can affect the dynamics of the consumer 
state evolution (Figure 2). Generally, a consumer’s reactions to HP and LP coupons are different. For example,
a consumer is likely to feel less comfortable when being offered a coupon on medication (HP) than food (LP).
Thus, in Section IV we assume that the Markov transition probabilities depend on the coupon offered, with
transition matrix $\Lambda_{LP}$ ($\Lambda_{HP}$) when an LP (HP) coupon is offered, where $\Lambda_{LP}$ and $\Lambda_{HP}$ are defined as:

$$\Lambda_{LP} = \begin{pmatrix} 1-\lambda_{N,A} & \lambda_{N,A} \\ 1-\lambda_{A,A} & \lambda_{A,A} \end{pmatrix}, \qquad \Lambda_{HP} = \begin{pmatrix} 1-\lambda'_{N,A} & \lambda'_{N,A} \\ 1-\lambda'_{A,A} & \lambda'_{A,A} \end{pmatrix}. \tag{13}$$

Figure 2: Coupon type dependent Markov state transition model.


Thus, the minimum cost function is given by (11), where $V_{\beta,\text{LP}}^t(p_t) = \beta^t C(p_t,\text{LP}) + V_\beta^{t+1}(T(p_t))$ and $V_{\beta,\text{HP}}^t(p_t) =
\beta^t C(p_t,\text{HP}) + (1-p_t)V_\beta^{t+1}(\lambda'_{N,A}) + p_t V_\beta^{t+1}(\lambda'_{A,A})$ denote the cost functions of using an LP coupon and an HP
coupon, respectively. $T(p_t)$ and $T'(p_t)$ are the one-step transitions given by $T(p_t) = \lambda_{N,A}(1-p_t) + \lambda_{A,A}p_t$ and
$T'(p_t) = \lambda'_{N,A}(1-p_t) + \lambda'_{A,A}p_t$.
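
A small sketch of how the next belief depends on the coupon type in this extension is given below. It follows the structure of the cost functions above: after an LP coupon the belief evolves under $\Lambda_{LP}$, while after an HP coupon the cost reveals the state and the next belief is the matching entry of $\Lambda_{HP}$. The function names and numerical values are illustrative assumptions.

```python
def next_belief_after_lp(p, lam_na, lam_aa):
    """LP coupon: state not revealed, so the belief evolves as T(p) under Lambda_LP."""
    return lam_na * (1.0 - p) + lam_aa * p


def next_belief_after_hp(revealed_state, lam_na_hp, lam_aa_hp):
    """HP coupon: the cost reveals the state, so the next Alerted belief is the
    Lambda_HP probability of being Alerted given the revealed state."""
    return lam_aa_hp if revealed_state == "Alerted" else lam_na_hp


# Assumed example in which HP coupons make the Alerted state more likely.
LAM_NA, LAM_AA = 0.1, 0.7          # entries of Lambda_LP
LAM_NA_HP, LAM_AA_HP = 0.3, 0.9    # entries of Lambda_HP (primed quantities)
```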


D. Policies under Noisy Cost Feedback and Uncertain Initial Belief 

Consider a setting in which the feedback regarding the cost may be noisy, e.g., the cost incurred by the consumer’s
response to the coupon is not deterministic. For each individual consumer, the state transition is independent of
the action of the retailer. For a given state $G_t$ and action $u_t$, define the distribution of observing a cost $C_t = c$ to be
$f(c \mid G_t,u_t)$. In this case, the threshold policy computed using deterministic costs might not be optimal. Moreover, if the initial
belief is unknown to the retailer, it has to estimate the consumer state before making a decision. Thus, we propose
some alternative approaches to decide which coupon to offer when the costs are random. A heuristic approach
to deal with the randomized cost is to use the threshold $\tau$ computed from the mean values of the costs. Furthermore,
the estimate of the consumer belief state $p_t$ or the actual state $G_t$ is updated by the maximum a posteriori (MAP)
rule [15]. After the estimation process, the retailer decides which coupon to offer based on the threshold policy
given in Section III.
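
The estimation step can be illustrated with a short Bayesian update. This is a sketch under assumptions that are not part of the model description: the noisy HP cost is taken to be Gaussian around $C_{HN}$ or $C_{HA}$ with a common standard deviation, the LP cost is taken to have the same distribution in both states (so it carries no information), and all names and numbers are hypothetical.

```python
import math


def gaussian_pdf(c, mean, std):
    """Density of a normal distribution, used as an assumed model for f(c | G_t, u_t)."""
    return math.exp(-0.5 * ((c - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def update_belief(p, cost, action, lam_na, lam_aa, c_hn, c_ha, std):
    """Posterior Alerted belief after observing a noisy cost, followed by one Markov step."""
    if action == "HP":
        like_alerted = gaussian_pdf(cost, c_ha, std)   # f(c | Alerted, HP)
        like_normal = gaussian_pdf(cost, c_hn, std)    # f(c | Normal, HP)
        evidence = p * like_alerted + (1.0 - p) * like_normal
        posterior = p * like_alerted / evidence if evidence > 0.0 else p
    else:
        posterior = p   # LP cost assumed equally distributed in both states
    return (1.0 - posterior) * lam_na + posterior * lam_aa


def map_state(p):
    """MAP estimate of the state from the Alerted belief p."""
    return "Alerted" if p > 0.5 else "Normal"
```

The updated belief would then be compared with the heuristic threshold $\tau$ to choose the next coupon.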


E. Summary of Main Results 

For the problems described in Subsections II-A, II-B, and II-C, given all system parameters, we show the following:

• there exists an optimal stationary solution which has a single threshold property and 

• the threshold only depends on the system parameters, i.e., transition probabilities and instantaneous cost 
associated with each type of coupon. 
This means that, by adopting the optimal policy, the retailer will offer an HP coupon if $p_t$ is less than some threshold
and offer an LP coupon if $p_t$ is above the threshold.


For the model described in Subsection II-D, we assume that the cost feedback is noisy and the consumer belief
state is unknown to the retailer. For this model:

• we design a heuristic threshold policy for the case in which the received costs are noisy and

• we propose a Bayesian estimation approach to estimate the actual state or the belief state of the consumer
when the initial state is unknown to the retailer.


III. Optimal Policies with Known Consumer Statistics 
In this section, we consider the basic formulation as well as the first three extensions. First, we assume that there
is only one retailer and one consumer in the system and that the state transition of the consumer is independent of the
coupon offered. The evolution of the minimum cost function is given in (9), (10), and (11).


A. Properties of Minimum Cost Function 

Lemma 1: Let $V_\beta^{t,m}$ be the minimum cost when the decision horizon starts from $t$ and spans only $m$
stages. Given a time-invariant action set $\mathcal{U} = \{\text{LP}, \text{HP}\}$, for any $t \ge 1$, $V_\beta^{t,m}(p) = \beta V_\beta^{t-1,m}(p)$.

Proof: By (5) and $u_i \in \{\text{LP}, \text{HP}\}$ for any $i = 0,1,\ldots$,

$$\begin{aligned} V_\beta^{t,m}(p) &= \min_\pi E_\pi\left[ \sum_{i=t}^{t+m-1} \beta^i C(p_i,u_i) \,\middle|\, p_t = p \right] \\ &= \beta \min_\pi E_\pi\left[ \sum_{i=t-1}^{t+m-2} \beta^i C(p_i,u_i) \,\middle|\, p_{t-1} = p \right] \\ &= \beta V_\beta^{t-1,m}(p). \end{aligned} \tag{14}$$

By using induction on $t$, we can easily prove $V_\beta^{t,m}(p) = \beta V_\beta^{t-1,m}(p) = \cdots = \beta^t V_\beta^{0,m}(p)$. ■

Lemma 2: The minimum cost function $V_\beta^t(p)$ is a concave and non-decreasing function of $p$.

Proof: We prove these properties by induction. Define $V_\beta^{t,k}$ to be the minimum cost when the decision horizon
starts from $t$ and spans only $k$ stages. For $k = 1$,

$$V_\beta^{t,k}(p) = \beta^t \min\{ C_L, (1-p)C_{HN} + pC_{HA} \}, \tag{15}$$

which is a concave function of $p$. For $k = n-1$, assume that $V_\beta^{t,k}(p)$ is a concave function. Then, for $k = n$, since
$V_\beta^{t+1,n-1}(p)$ is concave, $T(p)$ is affine in $p$, and $V_{\beta,\text{LP}}^{t,n}(p) = \beta^t C_L + V_\beta^{t+1,n-1}(T(p))$, by the definition of concavity and Lemma 1 we can
conclude that $V_{\beta,\text{LP}}^{t,n}(p)$ is concave. Also, $V_{\beta,\text{HP}}^{t,n}(p)$ is an affine function of $p$; thus $V_\beta^{t,n}(p) = \min\{ V_{\beta,\text{LP}}^{t,n}(p), V_{\beta,\text{HP}}^{t,n}(p) \}$
is a concave function of $p$. Taking $k \to \infty$, $V_\beta^{t,k}(p) \to V_\beta^t(p)$, which implies $V_\beta^t(p)$ is a concave function.

Next, we prove the non-decreasing property of the minimum cost function. For $k = 1$, as shown in equation
(15), it is a non-decreasing function of $p$. Assume that $V_\beta^{t,k}(p)$ is a non-decreasing function for $k = n-1$. For
$k = n$, let $p_1 > p_2$. Then

$$\begin{aligned} V_{\beta,\text{LP}}^{t,n}(p_1) - V_{\beta,\text{LP}}^{t,n}(p_2) &= \beta\left( V_\beta^{t,n-1}(T(p_1)) - V_\beta^{t,n-1}(T(p_2)) \right) && (16) \\ &= \beta\big( V_\beta^{t,n-1}((\lambda_{A,A} - \lambda_{N,A})p_1 + \lambda_{N,A}) && (17) \\ &\qquad - V_\beta^{t,n-1}((\lambda_{A,A} - \lambda_{N,A})p_2 + \lambda_{N,A}) \big) && (18) \\ &\ge 0. && (19) \end{aligned}$$

By using the same technique, we can prove that given $p_2 - p_1 < 0$, $C_{HN} - C_{HA} < 0$ and $V_\beta^{t,k-1}(\lambda_{N,A}) - V_\beta^{t,k-1}(\lambda_{A,A}) \le 0$,

$$V_{\beta,\text{HP}}^{t,k}(p_1) - V_{\beta,\text{HP}}^{t,k}(p_2) \ge 0. \tag{20}$$

Since $V_\beta^{t,k}(p) = \min\{ V_{\beta,\text{LP}}^{t,k}(p), V_{\beta,\text{HP}}^{t,k}(p) \}$, it is the minimum of two non-decreasing functions. Therefore, $V_\beta^{t,k}(p)$
is non-decreasing. By taking $k \to \infty$, $V_\beta^{t,k}(p) \to V_\beta^t(p)$. Thus, $V_\beta^t(p)$ is a non-decreasing function. ■

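Lemma 3: Let $\Phi_{HP}$ be the set of values of $p_t$ for which offering an HP coupon is the optimal action at time
$t$. Then, $\Phi_{HP}$ is a convex set.

Proof: Since $\Phi_{HP} = \{ p \in [0,1] : V_\beta^t(p) = V_{\beta,\text{HP}}^t(p) \}$, assume that $p_t = a p_{t,1} + (1-a)p_{t,2}$, in which $p_{t,1}, p_{t,2} \in
\Phi_{HP}$ and $a \in [0,1]$. $V_\beta^t(p_t)$ can be written as:

$$\begin{aligned} V_\beta^t(p_t) &= V_\beta^t(a p_{t,1} + (1-a)p_{t,2}) && (21) \\ &\ge a V_\beta^t(p_{t,1}) + (1-a)V_\beta^t(p_{t,2}) && (22) \\ &= a V_{\beta,\text{HP}}^t(p_{t,1}) + (1-a)V_{\beta,\text{HP}}^t(p_{t,2}) && (23) \\ &= a\left[ (1-p_{t,1})[\beta^t C_{HN} + \beta V_\beta^t(\lambda_{N,A})] + p_{t,1}[\beta^t C_{HA} + \beta V_\beta^t(\lambda_{A,A})] \right] \\ &\quad + (1-a)\left[ (1-p_{t,2})[\beta^t C_{HN} + \beta V_\beta^t(\lambda_{N,A})] + p_{t,2}[\beta^t C_{HA} + \beta V_\beta^t(\lambda_{A,A})] \right] && (24) \\ &= V_{\beta,\text{HP}}^t(a p_{t,1} + (1-a)p_{t,2}). && (25) \end{aligned}$$

Thus, we have shown that:

$$V_\beta^t(p_t) \ge V_{\beta,\text{HP}}^t(a p_{t,1} + (1-a)p_{t,2}) = V_{\beta,\text{HP}}^t(p_t). \tag{26}$$

By the definition of $V_\beta^t(p_t)$ in (11), $V_\beta^t(p_t) \le V_{\beta,\text{HP}}^t(p_t)$. Therefore, $V_{\beta,\text{HP}}^t(p_t) = V_\beta^t(p_t)$, which implies $\Phi_{HP}$ is
convex. ■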
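B. Optimal Stationary Policy Structure

Theorem 1: There exists a threshold $\tau \in [0,1]$ such that the following policy is optimal:

$$\pi^*(p_t) = \begin{cases} \text{LP} & \text{if } \tau \le p_t \le 1, \\ \text{HP} & \text{if } 0 \le p_t \le \tau. \end{cases} \tag{27}$$

More precisely, define $\delta = C_{HA} - C_{HN} + \beta(V_\beta(\lambda_{A,A}) - V_\beta(\lambda_{N,A}))$. Then

$$\tau = \begin{cases} \dfrac{C_L - (1-\beta)(C_{HN} + \beta V_\beta(\lambda_{N,A}))}{(1-\beta)\delta} & \text{if } T(\tau) \ge \tau, \\[2ex] \dfrac{C_L + \beta\lambda_{N,A}(C_{HA} + \beta V_\beta(\lambda_{A,A}))}{(1-(\lambda_{A,A}-\lambda_{N,A})\beta)\delta} - \dfrac{(1-\beta(1-\lambda_{N,A}))(C_{HN} + \beta V_\beta(\lambda_{N,A}))}{(1-(\lambda_{A,A}-\lambda_{N,A})\beta)\delta} & \text{if } T(\tau) < \tau, \end{cases} \tag{28}$$

where for $\lambda_{N,A} \ge \tau$,

$$V_\beta(\lambda_{N,A}) = V_\beta(\lambda_{A,A}) = \frac{C_L}{1-\beta}, \tag{29}$$

and for $\lambda_{N,A} < \tau$,

$$V_\beta(\lambda_{N,A}) = (1-\lambda_{N,A})[C_{HN} + \beta V_\beta(\lambda_{N,A})] + \lambda_{N,A}[C_{HA} + \beta V_\beta(\lambda_{A,A})], \tag{30}$$

$$V_\beta(\lambda_{A,A}) = \min_{n \ge 0}\{ G(n) \}, \tag{31}$$

where

$$G(n) = \frac{C_L\dfrac{1-\beta^n}{1-\beta} + \beta^n\left[ \bar{T}^n(\lambda_{A,A})\,(C_{HN} + C(\lambda_{N,A})) + T^n(\lambda_{A,A})\,C_{HA} \right]}{1 - \beta^{n+1}\left[ \bar{T}^n(\lambda_{A,A})\dfrac{\lambda_{N,A}\beta}{1-(1-\lambda_{N,A})\beta} + T^n(\lambda_{A,A}) \right]}, \tag{32}$$

$$T^n(\lambda_{A,A}) = \frac{(\lambda_{A,A}-\lambda_{N,A})^{n+1}(1-\lambda_{A,A}) + \lambda_{N,A}}{1-(\lambda_{A,A}-\lambda_{N,A})}, \tag{33}$$

$$\bar{T}^n(\lambda_{A,A}) = 1 - T^n(\lambda_{A,A}), \tag{34}$$

$$C(\lambda_{N,A}) = \beta\,\frac{(1-\lambda_{N,A})C_{HN} + \lambda_{N,A}C_{HA}}{1-(1-\lambda_{N,A})\beta}. \tag{35}$$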

The proof of Theorem 1 is provided in Appendix A. An immediate consequence of this result is an upper
bound on $p_t$ for offering an HP coupon.

We define $\kappa$ to be the ratio between the gain from offering an HP coupon to a Normal consumer and the loss
from offering an HP coupon to a consumer whom the retailer thinks is Normal but is actually Alerted. Thus,

$$\kappa = \frac{C_L - C_{HN}}{C_{HA} - C_{HN}}. \tag{36}$$

For fixed costs, the threshold can be bounded by the following two corollaries.
Corollary 1: In the model where the transition probabilities $(\lambda_{N,A}, \lambda_{A,A})$ are unknown to the retailer, if

$$p_t \le \kappa, \tag{37}$$

then it is optimal for the retailer to offer an HP coupon.

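Corollary 2: Fix the costs and $\lambda_{A,A}$. Let $\lambda_1 = \frac{C_L - C_{HN}}{C_{HA} - C_{HN}}$ and let $\lambda_2$ be the solution of $\frac{\lambda_2}{1-(\lambda_{A,A}-\lambda_2)} = \frac{\beta(C_L - C_{HA})\lambda_2 + C_L - C_{HN}}{(1-\beta)C_{HA} - C_{HN} + \beta C_L}$.
When $\lambda_{N,A} \ge \lambda_2$, the threshold $\tau$ in the optimal stationary policy can be written as a closed-form expression with
respect to $\lambda_{N,A}$: if $\lambda_{N,A} \ge \lambda_1$,

$$\tau = \kappa; \tag{38}$$

if $\lambda_2 \le \lambda_{N,A} < \lambda_1$,

$$\tau = \frac{\beta(C_L - C_{HA})\lambda_{N,A} + C_L - C_{HN}}{(1-\beta)C_{HA} - C_{HN} + \beta C_L}. \tag{39}$$

Moreover, if $\lambda_{N,A} < \lambda_2$, $\tau$ can be upper bounded by

$$\tau \le \frac{\lambda_2}{1-(\lambda_{A,A}-\lambda_2)}. \tag{40}$$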

Figure 3: Discounted cost resulting from different decision policies

Detailed proofs of Corollaries 1 and 2 are presented in Appendix B and Appendix C, respectively. To illustrate
the performance of the proposed threshold policy, we compare the discounted cost resulting from the threshold policy
with that of a greedy policy, which minimizes the instantaneous cost at each decision epoch, and a lazy policy, in which
the retailer only offers LP coupons. We plot the discounted cost averaged over 1000 independent MDPs with respect to time $t$
for the different decision policies in Figure 3. The illustration demonstrates that the proposed threshold policy performs
better than the greedy policy and the lazy policy.
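
A comparison of this kind can be sketched with a small Monte Carlo simulation. The code below is an illustrative reconstruction and not the experiment behind Figure 3: the parameter values, horizon, seed, and function names are assumptions. Every policy is expressed as a belief threshold: the greedy policy offers HP whenever the expected instantaneous HP cost does not exceed $C_L$, and the lazy policy uses a threshold that is never met.

```python
import random


def discounted_cost(threshold, beta, c_l, c_hn, c_ha, lam_na, lam_aa, p0, horizon, rng):
    """Simulate one consumer and return the realized discounted cost.

    HP is offered whenever the current belief p is at or below `threshold`.
    """
    alerted = rng.random() < p0      # hidden initial state drawn from the initial belief
    p, total = p0, 0.0
    for t in range(horizon):
        if p <= threshold:           # offer HP: the cost reveals the state
            total += beta ** t * (c_ha if alerted else c_hn)
            p = 1.0 if alerted else 0.0
        else:                        # offer LP: state-independent cost, no information
            total += beta ** t * c_l
        # hidden state and belief both move one step along the Markov chain
        alerted = rng.random() < (lam_aa if alerted else lam_na)
        p = (1.0 - p) * lam_na + p * lam_aa
    return total


def average_cost(threshold, runs=1000, seed=0, **kw):
    """Average realized discounted cost over `runs` independent simulated consumers."""
    rng = random.Random(seed)
    return sum(discounted_cost(threshold, rng=rng, **kw) for _ in range(runs)) / runs


# Assumed illustrative parameters (not the values behind Figure 3).
PARAMS = dict(beta=0.9, c_l=3.0, c_hn=1.0, c_ha=12.0,
              lam_na=0.1, lam_aa=0.7, p0=0.2, horizon=50)
GREEDY = (PARAMS["c_l"] - PARAMS["c_hn"]) / (PARAMS["c_ha"] - PARAMS["c_hn"])
LAZY = -1.0                          # belief is never <= -1, so HP is never offered
```

Calling `average_cost(tau, **PARAMS)` with a threshold `tau` obtained from Theorem 1, and again with `GREEDY` and `LAZY`, would give the three averages to compare.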


Figure 4a shows the optimal threshold with respect to $\lambda_{N,A}$ for three fixed choices of $\lambda_{A,A}$. It can be
seen that the threshold is increasing when $\lambda_{N,A}$ is small; this is because for a small $\lambda_{N,A}$ the consumer is less
likely to transition from Normal to Alerted. Therefore, the retailer tends to offer an HP coupon to the consumer.
When $\lambda_{N,A}$ gets larger, the consumer is more likely to transition from Normal to Alerted. Thus, the retailer tends
to play conservatively by decreasing the threshold for offering an HP coupon. When $\lambda_{N,A}$ is greater than $\kappa$, the
retailer will just use $\kappa$ as the threshold for offering an HP coupon. One can also observe that with increasing
$\lambda_{A,A}$, the threshold $\tau$ decreases. On the other hand, for fixed $C_{HN}$ and $C_{HA}$, Figure 4b shows that the threshold
$\tau$ increases as the cost of offering an LP coupon increases, making it more desirable to take a risk and offer an
HP coupon.

(a) Threshold $\tau$ vs. $\lambda_{N,A}$. (Parameters: $\beta = 0.9$, $C_L = 3$, $C_{HN} = 1$, $C_{HA} = 12$, $\kappa = 0.18$.)
(b) Threshold $\tau$ vs. $\lambda_{N,A}$. (Parameters: $\lambda_{A,A} = 0.7$, $\beta = 0.9$, $C_{HN} = 1$, $C_{HA} = 12$.)

Figure 4: Threshold $\tau$ vs. $\beta$ for different values of $\lambda_{A,A}$ and $\lambda_{N,A}$



(a) Threshold $\tau$ vs. $\beta$ for different values of $\lambda_{A,A}$. (Parameters: $\lambda_{N,A} = 0.1$, $C_L = 3$, $C_{HN} = 1$, $C_{HA} = 12$, $\kappa = 0.18$.)
(b) Threshold $\tau$ vs. $\beta$ for different values of $\lambda_{N,A}$. (Parameters: $\lambda_{A,A} = 0.7$, $C_L = 3$, $C_{HN} = 1$, $C_{HA} = 12$.)

Figure 5: Threshold $\tau$ vs. $\beta$ for different values of $\lambda_{A,A}$ and $\lambda_{N,A}$


The relationship between the discount factor $\beta$ and the threshold $\tau$ as functions of transition probabilities is
shown in Figure 5. It can be seen in Figure 5a that the threshold increases as $\beta$ increases. This is because when
$\beta$ is small, the retailer values the present rewards more than future rewards. Therefore, the retailer tends to play
conservatively so that it will not “creep out” the consumer in the present. Figure 5b shows that the threshold is
high when $\lambda_{A,A}$ is large or $\lambda_{N,A}$ is small. A high $\lambda_{A,A}$ value indicates that a consumer is more likely to remain in
the Alerted state. The retailer is willing to play aggressively since once the consumer is in the Alerted state, it can take a
very long time to transition back to the Normal state. A low $\lambda_{N,A}$ value implies that the consumer is not very privacy
sensitive. Thus, the retailer tends to offer HP coupons to reduce cost. One can also observe in Figure 5b that the
threshold $\tau$ equals $\kappa$ after $\lambda_{N,A}$ exceeds the ratio $\kappa$. This is consistent with the results shown in Figure 4.

The effect of the LP coupon cost on the threshold for different discount factors is plotted in Figure 6. It can be
seen that a higher $C_L$ will increase the threshold because the retailer is more likely to offer an HP coupon when
the cost of offering an LP coupon is high.

Figure 6: Threshold $\tau$ vs. $\beta$ for different values of $C_L$. (Parameters: $\lambda_{N,A} = 0.1$, $\lambda_{A,A} = 0.9$, $C_{HN} = 1$, $C_{HA} = 12$.)


C. Consumer with Multi-Level Alerted States 

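In this section, we study the case in which the consumer has multiple Alerted states. Without loss of generality, we
define the transition matrix to be

$$\Lambda = \begin{pmatrix} \lambda_{N,N} & \lambda_{N,A_1} & \cdots & \lambda_{N,A_K} \\ \lambda_{A_1,N} & \lambda_{A_1,A_1} & \cdots & \lambda_{A_1,A_K} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_{A_K,N} & \lambda_{A_K,A_1} & \cdots & \lambda_{A_K,A_K} \end{pmatrix} \tag{41}$$

and $e_i$ to be the $i$th row of $\Lambda$. The expected cost at time $t$, given belief $p_t$ and action $u_t$, has the following
expression:

$$C(p_t, u_t) = \begin{cases} C_L & \text{if } u_t = \text{LP} \\ p_t^T C & \text{if } u_t = \text{HP} \end{cases} \tag{42}$$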


Assuming that the retailer has perfect information about the belief states, the cost function evolves as follows.
By using an LP coupon at time $t$,

$$V^t_{\beta,LP}(p_t) = \beta^t C_L + V^{t+1}_\beta(p_{t+1}) = \beta^t C_L + V^{t+1}_\beta(T(p_t)), \tag{43}$$

where $T(p_t) = p_t^T \Lambda$ is the Markov transition operator generalizing (2). By using an HP coupon at time $t$,

$$V^t_{\beta,HP}(p_t) = \beta^t p_t^T C + V^{t+1}_\beta(p_{t+1}) = \beta^t p_t^T C + p_t^T \begin{pmatrix} V^{t+1}_\beta(e_1) \\ V^{t+1}_\beta(e_2) \\ \vdots \\ V^{t+1}_\beta(e_{K+1}) \end{pmatrix}. \tag{44}$$

Therefore, we have $V^t_\beta(p_t) = \min\{V^t_{\beta,LP}(p_t), V^t_{\beta,HP}(p_t)\}$.

Figure 7: Example of the optimal policy region for three-state consumer. (Parameters: $\lambda_{N,N} = 0.7$, $\lambda_{N,A_1} = 0.2$,
$\lambda_{N,A_2} = 0.1$; $\lambda_{A_1,N} = 0.2$, $\lambda_{A_1,A_1} = 0.5$, $\lambda_{A_1,A_2} = 0.3$; $\lambda_{A_2,N} = 0.1$, $\lambda_{A_2,A_1} = 0.2$, $\lambda_{A_2,A_2} = 0.7$; $\beta = 0.9$, $C_L = 7$,
$C_{HN} = 1$, $C_{HA_1} = 10$, $C_{HA_2} = 20$.)


In this problem, since the instantaneous costs are nondecreasing with the state when the action is fixed and the 
evolution of belief state is the same for both LP and HP, the existence of an optimal stationary policy with threshold 
property is guaranteed by Proposition 2 in (T§. The optimal stationary policy for a three-state consumer model is 
illustrated in Figure 7. For fixed costs, the plot shows the partition of the belief space based on the optimal actions
and reveals that offering an HP coupon is optimal when $p_{N,t}$, the belief of the consumer being in Normal state, is
high.
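
To make the multi-state quantities concrete, the sketch below evaluates the belief update $T(p_t) = p_t^T \Lambda$ and the expected one-step cost for a three-state consumer. It is a pure-Python illustration under stated assumptions: the names `LAMBDA`, `HP_COSTS`, `next_belief`, and `expected_cost` are hypothetical, the numbers are taken from the Figure 7 caption, and the Normal-to-Normal entry 0.7 is inferred from the requirement that each row sums to one.

```python
# Transition matrix over (Normal, Alerted_1, Alerted_2); rows sum to one.
LAMBDA = [
    [0.7, 0.2, 0.1],  # from Normal (0.7 inferred from the row-sum constraint)
    [0.2, 0.5, 0.3],  # from Alerted_1
    [0.1, 0.2, 0.7],  # from Alerted_2
]
HP_COSTS = [1.0, 10.0, 20.0]  # cost of an HP coupon in each state
C_L = 7.0                     # cost of an LP coupon in any state


def next_belief(belief, transition=LAMBDA):
    """One-step belief update T(p) = p^T Lambda."""
    n = len(transition)
    return [sum(belief[i] * transition[i][j] for i in range(n))
            for j in range(n)]


def expected_cost(belief, action, hp_costs=HP_COSTS, c_l=C_L):
    """Expected instantaneous cost: C_L for LP, p^T C for HP."""
    if action == "LP":
        return c_l
    return sum(b * c for b, c in zip(belief, hp_costs))


belief = [0.5, 0.3, 0.2]
hp_cost = expected_cost(belief, "HP")
lp_cost = expected_cost(belief, "LP")
after_normal = next_belief([1.0, 0.0, 0.0])  # equals the first row of LAMBDA
```

Comparing `hp_cost` and `lp_cost` only reflects the instantaneous cost; the optimal policy in Figure 7 also accounts for the discounted future cost.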


IV. Consumers with Coupon Dependent Transitions 


Generally, consumers’ reactions to HP and LP coupons are different. To be more specific, a consumer is likely to
feel less comfortable when being offered a coupon on medication (HP) than on food (LP). Thus, we assume that the
Markov transition probabilities depend on the coupon offered. Let $p_t$ denote the belief of a consumer being
in the Alerted state at time $t$.

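As shown in Figure 2, by offering an LP coupon, the state transition follows the Markov chain

$$\Lambda_{LP} = \begin{pmatrix} 1 - \lambda_{N,A} & \lambda_{N,A} \\ 1 - \lambda_{A,A} & \lambda_{A,A} \end{pmatrix}. \tag{45}$$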


Figure 8: Optimal policy threshold for consumer with/without coupon dependent transition probabilities (surfaces labeled
"coupon dependent transitions" and "no coupon dependent transitions", plotted against the costs $C_{HA}$ and $C_L$).
(Parameters: $\lambda_{N,A} = 0.2$, $\lambda_{A,A} = 0.8$, $\lambda'_{N,A} = 0.5$, $\lambda'_{A,A} = 0.9$, $\beta = 0.9$.)


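Otherwise, the state transition follows

$$\Lambda_{HP} = \begin{pmatrix} 1 - \lambda'_{N,A} & \lambda'_{N,A} \\ 1 - \lambda'_{A,A} & \lambda'_{A,A} \end{pmatrix}. \tag{46}$$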


According to the model in Section II, $\lambda_{A,A} > \lambda_{N,A}$ and $\lambda'_{A,A} > \lambda'_{N,A}$. Moreover, we assume that offering an HP
coupon will increase the probability of transitioning to or staying in the Alerted state. Therefore, $\lambda'_{A,A} > \lambda_{A,A}$ and
$\lambda'_{N,A} > \lambda_{N,A}$. The minimum cost function evolves as follows: for an HP coupon offered at time $t$, we have

$$V^t_{\beta,HP}(p_t) = \beta^t C(p_t, \text{HP}) + (1 - p_t) V^{t+1}_\beta(\lambda'_{N,A}) + p_t V^{t+1}_\beta(\lambda'_{A,A}).$$

Otherwise,

$$V^t_{\beta,LP}(p_t) = \beta^t C_L + V^{t+1}_\beta(p_{t+1}) = \beta^t C_L + V^{t+1}_\beta(T(p_t)),$$

where $T(p_t) = \lambda_{N,A}(1 - p_t) + \lambda_{A,A} p_t$ is the one step transition defined in Section II.

Theorem 2: Given action dependent transition matrices $\Lambda_{LP}$ and $\Lambda_{HP}$, the optimal stationary policy has threshold
structure.

A detailed proof of Theorem 2 is presented in Appendix D.
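
The threshold structure can also be explored numerically. The sketch below runs value iteration on a belief grid for the time-normalized form of the two recursions above, $V(p) = \min\{C_L + \beta V(T(p)),\ C(p,\text{HP}) + \beta[(1-p)V(\lambda'_{N,A}) + pV(\lambda'_{A,A})]\}$. It is an illustrative approximation, not the authors' computation: the function name `solve_threshold`, the grid size, the iteration count, linear interpolation between grid points, and the cost values ($C_L = 3$, $C_{HN} = 1$, $C_{HA} = 12$, borrowed from the earlier figures) are all assumptions. The transition probabilities are those listed in the Figure 8 caption.

```python
def solve_threshold(lam_na, lam_aa, lam_na_hp, lam_aa_hp,
                    c_l, c_hn, c_ha, beta=0.9, n_grid=101, n_iter=600):
    """Grid value iteration; returns (values, largest belief where HP is chosen)."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]

    def interp(values, p):
        # Linear interpolation of the value function at belief p in [0, 1].
        x = min(max(p, 0.0), 1.0) * (n_grid - 1)
        lo = min(int(x), n_grid - 2)
        w = x - lo
        return (1 - w) * values[lo] + w * values[lo + 1]

    def q_values(values, p):
        # LP: fixed cost, belief moves through the LP chain.
        q_lp = c_l + beta * interp(values, lam_na * (1 - p) + lam_aa * p)
        # HP: state-dependent cost, belief resets to the HP-chain row.
        q_hp = ((1 - p) * (c_hn + beta * interp(values, lam_na_hp))
                + p * (c_ha + beta * interp(values, lam_aa_hp)))
        return q_lp, q_hp

    values = [0.0] * n_grid
    for _ in range(n_iter):
        values = [min(q_values(values, p)) for p in grid]

    hp_region = []
    for p in grid:
        q_lp, q_hp = q_values(values, p)
        if q_hp <= q_lp:
            hp_region.append(p)
    return values, (max(hp_region) if hp_region else None)


values, tau_hat = solve_threshold(0.2, 0.8, 0.5, 0.9,
                                  c_l=3.0, c_hn=1.0, c_ha=12.0)
```

For these particular inputs every post-transition belief lies above $(C_L - C_{HN})/(C_{HA} - C_{HN}) = 2/11$, so the grid estimate `tau_hat` should sit within one grid step of that ratio; other inputs will generally give a different threshold.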

Figure 8 shows the effect of costs on the threshold $\tau$. We can see that for fixed $C_L$ and $C_{HA}$, the threshold
for LP coupons for consumers in this model is lower than in our original model without coupon-dependent transition
probabilities. The retailer can only offer an LP coupon with certain combinations of costs; we call this the LP-only
region. One can also see that the LP-only region for the coupon-independent transition case is smaller than that for
the coupon-dependent transition case since, for the latter, the likelihood of being in an Alerted state is higher for
the same costs.

V. Policies under Noisy Cost Feedback and Uncertain Initial Belief


In this section, we study the case in which the received costs are random. In the previous sections, if the retailer
offered an HP coupon at time $t$, then it could learn the state of the consumer at time $t$ based on whether the
received cost was $C_{HN}$ or $C_{HA}$. If the cost feedback is random, then the retailer may not be able to infer the
consumer’s state exactly. We describe policy heuristics for this setting that perform Bayesian estimation of the
quantity $p_t$ used in the threshold policy earlier. This approach is also useful when the initial value $p_0$ is not known
to the retailer.

We model the noisy cost feedback by assuming the received cost $C_t$ is random. The distribution of $C_t$ is
given by a conditional probability density $f(c \mid G_t, u_t)$ on a bounded subset of $\mathbb{R}$, where $G_t$ is the state of the
consumer and $u_t$ is the action taken by the retailer at time $t$. To match the previous model, we further take
$f(c \mid G_t = \text{Alerted}, u_t = \text{LP}) = f(c \mid G_t = \text{Normal}, u_t = \text{LP})$ to indicate that the received cost conveys no information
about the state under an LP coupon. Let $f(c \mid u_t = \text{LP}) = f(c \mid G_t = \text{Alerted}, u_t = \text{LP})$. For a given value $p_t = p$,
define the likelihood of observing a cost $C_t = c$ under the two coupons:

$$\ell(c \mid \text{LP}, p) = f(c \mid \text{Alerted}, \text{LP}) \tag{47}$$

$$\ell(c \mid \text{HP}, p) = f(c \mid \text{Normal}, \text{HP})(1 - p) + f(c \mid \text{Alerted}, \text{HP})\,p \tag{48}$$

These likelihoods will be useful in defining the two estimators.

In both approaches in this section the retailer computes an estimate $\hat{p}_t$ of the probability $p_t$ that $G_t = \text{Alerted}$.
It then uses (27) to decide which coupon to offer at time $t$ by comparing $\hat{p}_t$ to a version of the threshold in
(28). Define $\mathcal{C}_L$, $\mathcal{C}_{HN}$, and $\mathcal{C}_{HA}$ to be the feasible cost sets $\{c : f(c \mid \text{LP}) > 0\}$, $\{c : f(c \mid \text{Normal}, \text{HP}) > 0\}$, and
$\{c : f(c \mid \text{Alerted}, \text{HP}) > 0\}$, respectively. Since $\tau$ involves the costs $C_L$, $C_{HN}$ and $C_{HA}$, there are several ways to
compute an approximate threshold under the cost uncertainty.

Firstly, we can set $C_L$, $C_{HN}$ and $C_{HA}$ to be the expected costs:

$$C_L = \int_{\mathbb{R}} c\, f(c \mid \text{LP})\, dc \tag{49}$$

$$C_{HN} = \int_{\mathbb{R}} c\, f(c \mid \text{Normal}, \text{HP})\, dc \tag{50}$$

$$C_{HA} = \int_{\mathbb{R}} c\, f(c \mid \text{Alerted}, \text{HP})\, dc. \tag{51}$$

Plugging these into (28) gives the mean threshold $\tau_{avg}$. Since $\tau$ is monotonically increasing in $C_L$ and $C_{HA}$ and
monotonically decreasing in $C_{HN}$, we can compute an upper bound on $\tau$ by setting $C_L = \max\{c : c \in \mathcal{C}_L\}$,
$C_{HA} = \max\{c : c \in \mathcal{C}_{HA}\}$, and $C_{HN} = \min\{c : c \in \mathcal{C}_{HN}\}$. These values give the upper bound threshold
$\tau_{max}$. Similarly, by setting $C_L$ and $C_{HA}$ to the lower bounds on the support and $C_{HN}$ to the upper bound,
we obtain a lower bound threshold $\tau_{min}$. Finally, we computed a robust version of the threshold, $\tau_R$, as

$$\tau_R = \Big\{\tau : \max_{C_L, C_{HN}, C_{HA}} \big\{ \min_{\pi(p_t)} V_\beta(p_t) \big\} \Big\},$$

where $(C_L, C_{HN}, C_{HA}) \in \mathcal{C}_L \times \mathcal{C}_{HN} \times \mathcal{C}_{HA}$. This threshold policy is the
largest (worst-case) threshold over all possible combinations of costs. Thus, it gives the max-min value of the total
discounted cost. We can see that the total discounted cost induced by this robust version of the threshold is close to
that induced by using the upper bound of costs.

Figure 9: Temporal discounted costs for different heuristics on computing thresholds. (Parameters: $\lambda_{N,A} = 0.2$,
$\lambda_{A,A} = 0.8$, $p_0 = 0.2$, $\beta = 0.95$, $f(c \mid \text{LP}) = \text{Unif}[6, 10]$, $f(c \mid \text{Normal}, \text{HP}) = \text{Unif}[0.2, 5.8]$, and $f(c \mid \text{Alerted}, \text{HP}) =
\text{Unif}[12, 20]$.) The discounted cost is averaged over 1000 independent runs.
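
As a small worked example of the expected costs (49)-(51), the sketch below integrates $c\,f(c)$ numerically for the uniform densities listed in the Figure 9 caption. The helper names `uniform_pdf` and `expected_cost`, the midpoint rule, and the step count are assumptions of this illustration; for a uniform density the result is simply the midpoint of the support.

```python
def uniform_pdf(low, high):
    """Density of Unif[low, high] as a function of the cost c."""
    return lambda c: 1.0 / (high - low) if low <= c <= high else 0.0


def expected_cost(pdf, low, high, steps=10_000):
    """Midpoint-rule approximation of the integral of c * pdf(c) over [low, high]."""
    width = (high - low) / steps
    total = 0.0
    for k in range(steps):
        c = low + (k + 0.5) * width
        total += c * pdf(c)
    return total * width


# Supports taken from the Figure 9 caption.
c_l_mean = expected_cost(uniform_pdf(6.0, 10.0), 6.0, 10.0)
c_hn_mean = expected_cost(uniform_pdf(0.2, 5.8), 0.2, 5.8)
c_ha_mean = expected_cost(uniform_pdf(12.0, 20.0), 12.0, 20.0)
```

The bound thresholds use the endpoints of the same supports in place of these means.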

A. MAP Estimation of the Consumer State 

In the previous model, if $u_t = \text{HP}$ the retailer could infer $G_t$ based on $C_t$, so $p_{t+1}$ is given by the state transitions
of the Markov chain. With noisy costs this exact inference is no longer possible. A simple heuristic for the retailer
is to try to infer $G_t$ based on the random cost $C_t$, compute an estimate of $p_t$, and then use the previous strategy.

At time $t = 1$, given an initial $p_0$ we estimate $\hat{p}_1 = T(p_0)$. The retailer then applies the threshold policy (27)
with input $\hat{p}_1$ to offer a coupon. For times $t = 2, 3, \ldots$ the retailer treats the estimate $\hat{p}_{t-1}$ as an estimate of the
probability that $G_{t-1} = \text{Alerted}$. If $u_{t-1} = \text{LP}$, then the retailer sets $\hat{p}_t = T(\hat{p}_{t-1})$. If $u_{t-1} = \text{HP}$ then the retailer
uses a maximum a posteriori probability (MAP) detection rule to estimate the state $G_{t-1}$ based on the received
cost $C_{t-1}$. That is, it sets $\hat{G}_{t-1} = \text{Normal}$ if

$$\frac{f(C_{t-1} \mid \text{Normal}, \text{HP})(1 - \hat{p}_{t-1})}{f(C_{t-1} \mid \text{Alerted}, \text{HP})\,\hat{p}_{t-1}} > 1 \tag{52}$$

and $\hat{G}_{t-1} = \text{Alerted}$ otherwise, where $C_{t-1}$ is the received cost at time $t - 1$. It then uses the following estimate
$\hat{p}_t$ at time $t$:

$$\hat{p}_t = \begin{cases} \lambda_{N,A} & \text{if } \hat{G}_{t-1} = \text{Normal} \\ \lambda_{A,A} & \text{if } \hat{G}_{t-1} = \text{Alerted} \end{cases} \tag{53}$$

Essentially, the retailer uses MAP estimation to infer $G_{t-1}$ after receiving the cost $C_{t-1}$ from the action $u_{t-1} =
\text{HP}$. If the densities $f(c \mid \text{Normal}, \text{HP})$ and $f(c \mid \text{Alerted}, \text{HP})$ have disjoint supports, then the inference of $G_{t-1}$ is
error free, so $\hat{G}_{t-1} = G_{t-1}$ and the estimate $\hat{p}_t$ is correct. Figure 9 shows the discounted cost as a function of time
for some different variants of the threshold in (28). In this example the cost distributions are uniformly distributed
in disjoint intervals. The plot shows that the mean threshold yields a total discounted cost that is slightly less than
the upper and lower bound thresholds.
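
The MAP update described above can be sketched in a few lines of Python. The function names `uniform_pdf` and `map_belief_update` are hypothetical, and the default supports and transition probabilities are the ones listed in the Figure 9 caption; the comparison of the two weighted densities is the ratio test for the Normal state written without the division.

```python
def uniform_pdf(c, low, high):
    """Density of Unif[low, high] evaluated at cost c."""
    return 1.0 / (high - low) if low <= c <= high else 0.0


def map_belief_update(p_hat, action, cost, lam_na=0.2, lam_aa=0.8,
                      normal_support=(0.2, 5.8),
                      alerted_support=(12.0, 20.0)):
    """Return the next belief estimate given the last action and received cost."""
    if action == "LP":
        # The LP cost carries no state information: propagate through the chain.
        return lam_na * (1 - p_hat) + lam_aa * p_hat
    # HP: compare the two posterior weights (MAP detection of the state).
    w_normal = uniform_pdf(cost, *normal_support) * (1 - p_hat)
    w_alerted = uniform_pdf(cost, *alerted_support) * p_hat
    return lam_na if w_normal > w_alerted else lam_aa


after_low_cost = map_belief_update(0.3, "HP", 3.0)    # cost only possible in Normal
after_high_cost = map_belief_update(0.3, "HP", 15.0)  # cost only possible in Alerted
after_lp = map_belief_update(0.3, "LP", 8.0)
```

With disjoint supports, as here, the detected state is always correct; with overlapping supports the same rule can misclassify costs that fall in the overlap.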


B. Bayesian Estimation of State Probabilities 

In the previous approach, the retailer estimates the underlying state and then uses this to form an estimate of
the probability $p_t$ that $G_t = \text{Alerted}$. A different approach is to form a Bayes estimate of $p_t$: the retailer computes
a probability distribution on $[0, 1]$ representing its uncertainty about $p_t$. To choose an action $u_t$ it can use a point
estimate of $p_t$ in (27) with one of the thresholds described before.

In this formulation, the estimator of $p_t$ is a probability distribution. Let $q_{t-1}(p)$ be the estimator of $p_{t-1}$. The
retailer treats this as a prior distribution. Upon receiving the cost $C_{t-1}$ it computes a posterior estimate on $p_{t-1}$
using Bayes' rule. If $u_{t-1} = \text{HP}$, it sets

$$q_{t-1}(p \mid C_{t-1}) = \frac{\ell(C_{t-1} \mid \text{HP}, p)\, q_{t-1}(p)}{\int_0^1 \ell(C_{t-1} \mid \text{HP}, p')\, q_{t-1}(p')\, dp'}. \tag{54}$$

If $u_{t-1} = \text{LP}$ then from (47) we can see that $\ell(C_{t-1} \mid \text{LP}, p)$ does not depend on $p$, so the posterior $q_{t-1}(p \mid C_{t-1}) =
q_{t-1}(p)$ in this case. Given the posterior estimate $q_{t-1}(p \mid C_{t-1})$ the retailer then evolves the state distribution
through the Markov chain governing the state to form the prior distribution $q_t(p)$ for estimating $p_t$ at time $t$. That
is, if $P_{t-1}$ is a random variable with distribution $q_{t-1}(p \mid C_{t-1})$, then $q_t(p)$ is the distribution of $T(P_{t-1})$. Let
$Q_{t-1}(p \mid C_{t-1}) = \int_0^p q_{t-1}(p' \mid C_{t-1})\, dp'$ be the cumulative distribution function of $P_{t-1}$. Then

$$\mathbb{P}\big(T(P_{t-1}) \le p\big) = \mathbb{P}\left(P_{t-1} \le \frac{p - \lambda_{N,A}}{\lambda_{A,A} - \lambda_{N,A}}\right) = Q_{t-1}\left(\frac{p - \lambda_{N,A}}{\lambda_{A,A} - \lambda_{N,A}} \,\Big|\, C_{t-1}\right) \tag{55}$$

and

$$q_t(p) = \frac{1}{\lambda_{A,A} - \lambda_{N,A}}\; q_{t-1}\left(\frac{p - \lambda_{N,A}}{\lambda_{A,A} - \lambda_{N,A}} \,\Big|\, C_{t-1}\right). \tag{56}$$

The retailer then uses $q_t(p)$ to form a point estimate $\hat{p}_t$ of $p_t$ suitable for applying the threshold policy in (27).
We consider two such point estimates, which we call the mean and MAP estimators, respectively:

$$\hat{p}_{t,\text{mean}} = \int_0^1 p\, q_t(p)\, dp \tag{57}$$

$$\hat{p}_{t,\text{MAP}} = \operatorname*{argmax}_{p \in [0,1]} q_t(p). \tag{58}$$
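
One way to experiment with this estimator is a discretized version in which the distribution over $p$ is carried by weighted support points. This is only a sketch under stated assumptions, not the authors' implementation: the names `bayes_step`, `mean_estimate`, and `map_estimate` are hypothetical, the uniform supports and transition probabilities are those in the Figure 10 caption, and the prior is taken to be uniform on an equally spaced grid. Because $T$ is affine, pushing the support points through $T$ rescales the density by a constant, so the point with the largest weight also maximizes the transformed density.

```python
def uniform_pdf(c, low, high):
    """Density of Unif[low, high] evaluated at cost c."""
    return 1.0 / (high - low) if low <= c <= high else 0.0


def bayes_step(points, weights, action, cost, lam_na=0.2, lam_aa=0.8,
               normal_support=(0.25, 7.75), alerted_support=(6.0, 18.0)):
    """Posterior update (HP only) followed by the Markov transition T."""
    if action == "HP":
        f_n = uniform_pdf(cost, *normal_support)
        f_a = uniform_pdf(cost, *alerted_support)
        # Likelihood of the cost under belief p: f_n * (1 - p) + f_a * p.
        weights = [w * (f_n * (1 - p) + f_a * p)
                   for p, w in zip(points, weights)]
        total = sum(weights)
        if total == 0.0:
            raise ValueError("cost has zero likelihood under every belief")
        weights = [w / total for w in weights]
    # Under LP the posterior equals the prior; both cases then apply T.
    points = [lam_na * (1 - p) + lam_aa * p for p in points]
    return points, weights


def mean_estimate(points, weights):
    return sum(p * w for p, w in zip(points, weights))


def map_estimate(points, weights):
    return max(zip(weights, points))[1]


grid = [i / 100 for i in range(101)]
prior = [1.0 / 101] * 101
pts, wts = bayes_step(grid, prior, "HP", cost=10.0)  # 10.0 lies only in the Alerted support
p_mean = mean_estimate(pts, wts)
p_map = map_estimate(pts, wts)
```

Here the observed cost rules out the Normal state, so the posterior weight is proportional to $p$ and both estimates move toward $\lambda_{A,A}$.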


Figure 10 shows the discounted cost versus time for uniformly distributed costs with overlapping support. The
decision is made by following the optimal stationary policy computed by the mean threshold in Section V. We illustrate
the result for four algorithms: the solid curve and the dash-dot curve are the MAP and mean strategies described
above, respectively; the dashed curve is a policy in which the costs are random but the algorithm is given side
information about $G_t$ after choosing $u_t = \text{HP}$ (perfect state information); finally, the curve with crosses is the MAP
estimate of the actual state $G_t$ described in Section V-A. In this example, as one can expect, decision making with
perfect state information has the minimum discounted cost. MAP estimation of $G_t$ results in a 0.82% increase in
total discounted cost compared to the case in which the retailer receives perfect information about the consumer state.
The MAP and mean policies for estimating the belief state $p_t$ incur increases of 2.9% and 4.29%, respectively.
Thus, the MAP policy for estimating the belief performs slightly better than the mean policy. Effectively, the lack of initial
belief knowledge does not affect the discounted cost very much on average. This is because offering an HP coupon
allows the retailer to learn the actual state from the cost feedback and thus reset the belief state.

Figure 10: Temporal discounted costs for different estimation mechanisms. (Parameters: $\lambda_{N,A} = 0.2$, $\lambda_{A,A} = 0.8$,
$p_0 = 0.2$, $\beta = 0.9$, $f(c \mid \text{LP}) = \text{Unif}[3, 9]$, $f(c \mid \text{Normal}, \text{HP}) = \text{Unif}[0.25, 7.75]$, $f(c \mid \text{Alerted}, \text{HP}) = \text{Unif}[6, 18]$.) The
discounted cost is averaged over 1000 independent runs.


VI. Conclusions 

We proposed a POMDP model to capture the interactions between a retailer and a privacy-sensitive consumer
in the context of personalized shopping. The retailer seeks to minimize the expected discounted cost of violating
the consumer’s privacy. We showed that the optimal coupon-offering policy is a stationary policy that takes the
form of an explicit threshold that depends on the model parameters. In summary, the retailer offers an HP coupon
when the Normal to Alerted transition probability is low or the probability of staying in the Alerted state is high.
Furthermore, the threshold structure of the optimal policy also holds for consumers whose privacy sensitivity can be captured via
multiple Alerted states as well as for the case in which consumers exhibit coupon-dependent transitions. For the
case in which the cost feedback from the consumer is noisy, we have introduced a heuristic method using the
mean value of costs to compute the decision threshold. Furthermore, under the noisy cost feedback scenario, we have
introduced a Bayesian data analysis approach for decision making which includes estimating the consumer belief state
when the initial belief state is unknown to the retailer. Our work suggests several interesting future directions: one
straightforward extension of our work is to model uncertainties in the statistical model for the consumer transition
probabilities. Further afield, one can also develop game theoretic models to study the interaction between a retailer
and strategic consumers and develop methods to test those models in practice.


Appendix A
Proof of Theorem 1


Proof: Let $p_F$ be the stationary distribution of the Markov transition. Then $p_F = \lambda_{A,A} p_F + (1 - p_F)\lambda_{N,A}$,
which implies $p_F = \frac{\lambda_{N,A}}{1 - \lambda_{A,A} + \lambda_{N,A}}$. Remember that the threshold is the solution to $V^t_{\beta,LP}(p_t) = V^t_{\beta,HP}(p_t)$. Let $\tau$
be the threshold value; we have:

$$\beta^t C_L + V^{t+1}_\beta(T(\tau)) = (1 - \tau)\big[\beta^t C_{HN} + V^{t+1}_\beta(\lambda_{N,A})\big] + \tau\big[\beta^t C_{HA} + V^{t+1}_\beta(\lambda_{A,A})\big]. \tag{59}$$


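By the definition of $V^t_\beta(p_t)$, we know that $V^t_\beta(p_t) = \beta^t V_\beta(p_t)$. Thus $V^t_\beta(\lambda_{N,A}) = \beta^t V_\beta(\lambda_{N,A})$ and $V^t_\beta(\lambda_{A,A}) =
\beta^t V_\beta(\lambda_{A,A})$.

If $T(\tau) > \tau$, which is equivalent to $p_F > \tau$, then $V^{t+1}_\beta(T(\tau)) = V^{t+1}_{\beta,LP}(T(\tau))$. Therefore,

$$V^t_{\beta,LP}(\tau) = \lim_{n \to \infty} \Big\{ \beta^t \frac{1 - \beta^n}{1 - \beta} C_L + \beta^n V^t_\beta(T^n(\tau)) \Big\},$$

where $T^n(\tau) = T(T^{n-1}(\tau)) = p_F\big(1 - (\lambda_{A,A} - \lambda_{N,A})^n\big) + (\lambda_{A,A} - \lambda_{N,A})^n \tau$.
Taking $n \to \infty$, we have $V^t_{\beta,LP}(\tau) = \beta^t \frac{C_L}{1 - \beta}$. Substituting this into (59) yields:

$$\frac{C_L}{1 - \beta} = (1 - \tau) C_{HN} + \tau C_{HA} + \beta\big(\tau V_\beta(\lambda_{A,A}) + (1 - \tau) V_\beta(\lambda_{N,A})\big). \tag{60}$$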


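By rearranging terms in the above expression, we have

$$\tau = \frac{\frac{C_L}{1 - \beta} - C_{HN} - \beta V_\beta(\lambda_{N,A})}{(C_{HA} - C_{HN}) + \beta\big(V_\beta(\lambda_{A,A}) - V_\beta(\lambda_{N,A})\big)}. \tag{61}$$

If $p_F < \tau$, then $T(\tau) < \tau$. Therefore $V^{t+1}_\beta(T(\tau)) = V^{t+1}_{\beta,HP}(T(\tau))$, which implies

$$V^t_{\beta,LP}(\tau) = \beta^t C_L + V^{t+1}_\beta(T(\tau)) = \beta^t C_L + V^{t+1}_{\beta,HP}(T(\tau)) = V^t_{\beta,HP}(\tau). \tag{62}$$

In this case,

$$C_L + \beta V_{\beta,HP}(T(\tau)) = V_{\beta,HP}(\tau). \tag{63}$$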


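Substituting the expressions for $T(\tau)$ and $V_{\beta,HP}(\cdot)$ into (63), we have

$$\tau = \frac{C_L - \big(1 - \beta(1 - \lambda_{N,A})\big)\big(C_{HN} + \beta V_\beta(\lambda_{N,A})\big)}{\big(1 - (\lambda_{A,A} - \lambda_{N,A})\beta\big)\big(C_{HA} - C_{HN} + \beta(V_\beta(\lambda_{A,A}) - V_\beta(\lambda_{N,A}))\big)} + \frac{\beta \lambda_{N,A}\big(C_{HA} + \beta V_\beta(\lambda_{A,A})\big)}{\big(1 - (\lambda_{A,A} - \lambda_{N,A})\beta\big)\big(C_{HA} - C_{HN} + \beta(V_\beta(\lambda_{A,A}) - V_\beta(\lambda_{N,A}))\big)}. \tag{64}$$

Next, we present how to compute $V_\beta(\lambda_{N,A})$ and $V_\beta(\lambda_{A,A})$.

Case 1: If $\lambda_{N,A} > \tau$, then by Modeling Assumption 2, $\lambda_{A,A} > \lambda_{N,A} > \tau$ and $p_F > \lambda_{N,A} > \tau$. Thus, both $\lambda_{A,A}$
and $\lambda_{N,A}$ are in the LP region $\Phi_{LP}$; therefore,

$$V_\beta(\lambda_{N,A}) = V_\beta(\lambda_{A,A}) = \frac{C_L}{1 - \beta}. \tag{65}$$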
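Case 2: If $\lambda_{N,A} < \tau$, we have $V_\beta(\lambda_{N,A}) = V_{\beta,HP}(\lambda_{N,A})$. Therefore,

$$V_\beta(\lambda_{N,A}) = (1 - \lambda_{N,A})\big[C_{HN} + \beta V_\beta(\lambda_{N,A})\big] + \lambda_{N,A}\big[C_{HA} + \beta V_\beta(\lambda_{A,A})\big], \tag{66}$$

$$V_\beta(\lambda_{A,A}) = \min_{u_t \in \{\text{HP}, \text{LP}\}} V_{\beta,u_t}(\lambda_{A,A}) \tag{67}$$

$$= \min\big\{ C_L + \beta V_\beta(T(\lambda_{A,A})),\; V_{\beta,HP}(\lambda_{A,A}) \big\} \tag{68}$$

$$= \min\Big\{ C_L \frac{1 - \beta^N}{1 - \beta},\; \min_{0 \le n \le N-1} \Big\{ C_L \frac{1 - \beta^n}{1 - \beta} + \beta^n V_{\beta,HP}(T^n(\lambda_{A,A})) \Big\} \Big\}. \tag{69}$$

Since $N \to \infty$ and $0 < \beta < 1$,

$$V_\beta(\lambda_{A,A}) = \min_{n \ge 0} \Big\{ C_L \frac{1 - \beta^n}{1 - \beta} + \beta^n V_{\beta,HP}(T^n(\lambda_{A,A})) \Big\}, \tag{70}$$

we have:

$$V_\beta(\lambda_{A,A}) = \min_{n \ge 0} \left\{ \frac{C_L \frac{1 - \beta^n}{1 - \beta} + \beta^n\big[\bar{T}^n(\lambda_{A,A})\big(C_{HN} + C(\lambda_{N,A})\big) + T^n(\lambda_{A,A}) C_{HA}\big]}{1 - \beta^{n+1}\Big[\bar{T}^n(\lambda_{A,A}) \frac{\beta \lambda_{N,A}}{1 - (1 - \lambda_{N,A})\beta} + T^n(\lambda_{A,A})\Big]} \right\}, \tag{71}$$

where

$$T^n(\lambda_{A,A}) = T(T^{n-1}(\lambda_{A,A})) = \frac{(\lambda_{A,A} - \lambda_{N,A})^{n+1}(1 - \lambda_{A,A}) + \lambda_{N,A}}{1 - (\lambda_{A,A} - \lambda_{N,A})}, \tag{72}$$

$$\bar{T}^n(\lambda_{A,A}) = 1 - T^n(\lambda_{A,A}), \tag{73}$$

$$C(\lambda_{N,A}) = \beta\, \frac{(1 - \lambda_{N,A}) C_{HN} + \lambda_{N,A} C_{HA}}{1 - (1 - \lambda_{N,A})\beta}. \tag{74}$$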
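
The closed forms for the iterated transition used in this proof are easy to check numerically. The following sketch compares direct iteration of $T$ with the expressions for $T^n(\tau)$ and $T^n(\lambda_{A,A})$; the function names and the test values $\lambda_{N,A} = 0.2$, $\lambda_{A,A} = 0.8$ are choices made for this illustration only.

```python
def transition(p, lam_na, lam_aa):
    """One-step belief transition T(p)."""
    return lam_na * (1 - p) + lam_aa * p


def iterate_transition(p, n, lam_na, lam_aa):
    """Apply T to p exactly n times."""
    for _ in range(n):
        p = transition(p, lam_na, lam_aa)
    return p


def closed_form(p, n, lam_na, lam_aa):
    """T^n(p) = p_F * (1 - d^n) + d^n * p with d = lam_aa - lam_na."""
    d = lam_aa - lam_na
    p_f = lam_na / (1 - d)
    return p_f * (1 - d ** n) + d ** n * p


def closed_form_from_lam_aa(n, lam_na, lam_aa):
    """T^n(lam_aa) written as in the expression for T^n(lambda_{A,A})."""
    d = lam_aa - lam_na
    return (d ** (n + 1) * (1 - lam_aa) + lam_na) / (1 - d)


gaps = [abs(iterate_transition(0.3, n, 0.2, 0.8) - closed_form(0.3, n, 0.2, 0.8))
        for n in range(8)]
gaps_aa = [abs(iterate_transition(0.8, n, 0.2, 0.8)
               - closed_form_from_lam_aa(n, 0.2, 0.8))
           for n in range(8)]
```

Both lists of gaps should be zero up to floating-point rounding, and the iterates approach the stationary belief $p_F$ as $n$ grows.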


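Appendix B
Proof of Corollary 1

Proof: By setting $V^t_{\beta,LP}(p_t) \ge V^t_{\beta,HP}(p_t)$, we have

$$\beta^t C_L + \beta V^t_\beta(T(p_t)) \ge (1 - p_t)\big[\beta^t C_{HN} + \beta V^t_\beta(\lambda_{N,A})\big] + p_t\big[\beta^t C_{HA} + \beta V^t_\beta(\lambda_{A,A})\big]. \tag{75}$$

By Lemma 2 in the appendix, $V_\beta(p_t)$ is a concave function. Thus,

$$V_\beta(T(p_t)) = V_\beta\big(\lambda_{N,A}(1 - p_t) + \lambda_{A,A} p_t\big) \ge (1 - p_t) V_\beta(\lambda_{N,A}) + p_t V_\beta(\lambda_{A,A}). \tag{76}$$

By substituting (76) into (75), we can simplify inequality (75) to $(1 - p_t) C_{HN} + p_t C_{HA} \le C_L$, which implies
$p_t \le \frac{C_L - C_{HN}}{C_{HA} - C_{HN}} = \kappa$ when $V^t_{\beta,LP}(p_t) \ge V^t_{\beta,HP}(p_t)$.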

Appendix C
Proof of Corollary 2

Proof: Assume that $\lambda_{N,A} > \tau$; we have $\lambda_{A,A} > p_F = \frac{\lambda_{N,A}}{1 - (\lambda_{A,A} - \lambda_{N,A})} > \lambda_{N,A} > \tau$. In this case, by (61) and
(65), we have

$$\tau = \frac{C_L - C_{HN}}{C_{HA} - C_{HN}} = \kappa. \tag{77}$$

Thus, $\tau = \kappa$ if $\lambda_{N,A} > \kappa$. Assume that $\lambda_{N,A} < \tau$; then there are two cases for $p_F$:

Case 1: $p_F > \tau$; then $\lambda_{A,A} > p_F > \tau$, which implies

$$V_\beta(\lambda_{A,A}) = V_{\beta,LP}(\lambda_{A,A}) = \frac{C_L}{1 - \beta}. \tag{78}$$

By (61), (66), and (78), we have

$$\tau = \frac{\beta(C_L - C_{HA})\lambda_{N,A} + C_L - C_{HN}}{(1 - \beta) C_{HA} - C_{HN} + \beta C_L}. \tag{79}$$

Therefore, $\tau$ is given by (79) if $p_F = \frac{\lambda_{N,A}}{1 - (\lambda_{A,A} - \lambda_{N,A})} > \tau$ and $\lambda_{N,A} < \tau$.

Case 2: $p_F < \tau$; $\tau$ can be computed by (64), (66), and (71). Moreover, for fixed $\lambda_{A,A}$, (64) is a non-decreasing
function w.r.t. $\lambda_{N,A}$. Thus, let

$$\tau^+ = \frac{\lambda_{N,A}}{1 - (\lambda_{A,A} - \lambda_{N,A})} = \frac{\beta(C_L - C_{HA})\lambda_{N,A} + C_L - C_{HN}}{(1 - \beta) C_{HA} - C_{HN} + \beta C_L};$$

$\tau^+$ is an upper bound for the optimal threshold in Case 2, that is, $\tau < \tau^+$ in Case 2. Therefore,
since (64) is non-decreasing and (79) is decreasing and intersects with (77) at $\lambda_{N,A} = \frac{C_L - C_{HN}}{C_{HA} - C_{HN}}$, we have proved
Corollary 2.
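
The two closed-form branches of this proof, (77) and (79), can be collected into a small helper for experimentation. This is a hedged sketch rather than the authors' code: the function name `closed_form_threshold` is hypothetical, the sample costs come from the figure captions ($C_L = 3$, $C_{HN} = 1$, $C_{HA} = 12$, $\beta = 0.9$), and the helper returns `None` in Case 2, where the threshold has to be obtained from (64), (66), and (71) instead.

```python
def closed_form_threshold(lam_na, lam_aa, c_l, c_hn, c_ha, beta):
    """Threshold from (77) or (79); None when neither closed form applies."""
    kappa = (c_l - c_hn) / (c_ha - c_hn)
    if lam_na >= kappa:
        return kappa  # eq. (77): threshold equals kappa
    p_f = lam_na / (1 - (lam_aa - lam_na))  # stationary belief
    tau = ((beta * (c_l - c_ha) * lam_na + c_l - c_hn)
           / ((1 - beta) * c_ha - c_hn + beta * c_l))  # eq. (79)
    if lam_na < tau < p_f:
        return tau
    return None  # Case 2 (p_F < tau): no single closed form here


tau_kappa = closed_form_threshold(0.3, 0.9, 3.0, 1.0, 12.0, 0.9)
tau_case1 = closed_form_threshold(0.1, 0.9, 3.0, 1.0, 12.0, 0.9)
tau_case2 = closed_form_threshold(0.1, 0.7, 3.0, 1.0, 12.0, 0.9)
```

For the sample inputs, the first call falls in the $\lambda_{N,A} > \kappa$ regime, the second satisfies $\lambda_{N,A} < \tau < p_F$, and the third has $p_F < \tau$ and therefore returns `None`.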


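Appendix D
Proof of Theorem 2


Proof: Let $p_F = \frac{\lambda_{N,A}}{1 - (\lambda_{A,A} - \lambda_{N,A})}$ and $p'_F = \frac{\lambda'_{N,A}}{1 - (\lambda'_{A,A} - \lambda'_{N,A})}$ be the stationary beliefs of a consumer being in the
Alerted state when the transition matrix is $\Lambda_{LP}$ and $\Lambda_{HP}$, respectively. Let $\tau$ be the threshold of offering either HP or LP
coupons. Then we have $V^t_{\beta,LP}(\tau) = V^t_{\beta,HP}(\tau)$. This implies:

$$V^t_{\beta,LP}(\tau) - V^t_{\beta,HP}(\tau) = \beta^t\big(C_L - (1 - \tau) C_{HN} - \tau C_{HA}\big) + \big[V^{t+1}_\beta(T(\tau)) - V^{t+1}_\beta(T'(\tau))\big]. \tag{80}$$

In order to compute the threshold $\tau$, we need to divide the computation into four cases with respect to $T(\tau)$ and
$T'(\tau)$.

Case 1: $T(\tau) > \tau$ and $T'(\tau) > \tau$. Thus

$$V^{t+1}_\beta(T(\tau)) = V^{t+1}_{\beta,LP}(T(\tau)) = \beta^{t+1}\Big(\frac{C_L}{1 - \beta}\Big), \tag{81}$$

$$V^{t+1}_\beta(T'(\tau)) = V^{t+1}_{\beta,LP}(T'(\tau)) = \beta^{t+1}\Big(\frac{C_L}{1 - \beta}\Big). \tag{82}$$

By setting $V^t_{\beta,LP}(\tau) - V^t_{\beta,HP}(\tau) = 0$, we have $\tau = \frac{C_L - C_{HN}}{C_{HA} - C_{HN}}$.

Case 2: $T(\tau) < \tau$ and $T'(\tau) > \tau$.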

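Since $T(\tau) < \tau$, HP coupons will be offered from timeslot $t + 1$. Define $\eta = \frac{\beta(\lambda_{A,A} - \lambda_{N,A})}{1 - \beta(\lambda_{A,A} - \lambda_{N,A})}$
and $\eta' = \frac{\beta(\lambda'_{A,A} - \lambda'_{N,A})}{1 - \beta(\lambda'_{A,A} - \lambda'_{N,A})}$. Thus,

$$V^{t+1}_\beta(T(\tau)) = V^{t+1}_{\beta,HP}(T(\tau)) \tag{83}$$

$$= \beta^t \sum_{i=1}^{\infty} \Big\{ \beta^i \big[ (C_{HA} - C_{HN})\big(p_F(1 - (\lambda_{A,A} - \lambda_{N,A})^i) + (\lambda_{A,A} - \lambda_{N,A})^i \tau\big) + C_{HN} \big] \Big\} \tag{84}$$

$$= \beta^t \Big\{ \sum_{i=1}^{\infty} \beta^i \big[(C_{HA} - C_{HN}) p_F + C_{HN}\big] + \sum_{i=1}^{\infty} \beta^i \big[(C_{HA} - C_{HN})(\tau - p_F)(\lambda_{A,A} - \lambda_{N,A})^i\big] \Big\} \tag{85}$$

$$= \beta^t \Big\{ \frac{\beta}{1 - \beta}\big[(C_{HA} - C_{HN}) p_F + C_{HN}\big] + \eta (C_{HA} - C_{HN})(\tau - p_F) \Big\} \tag{86}$$

$$= \beta^t \Big\{ p_F (C_{HA} - C_{HN})\Big(\frac{\beta}{1 - \beta} - \eta\Big) + \frac{\beta}{1 - \beta} C_{HN} + \eta (C_{HA} - C_{HN}) \tau \Big\}. \tag{87}$$

Because $T'(\tau) > \tau$, only LP coupons will be offered after time $t$:

$$V^{t+1}_\beta(T'(\tau)) = V^{t+1}_{\beta,LP}(T'(\tau)) = \beta^{t+1}\Big(\frac{C_L}{1 - \beta}\Big). \tag{88}$$

By setting $V^t_{\beta,LP}(\tau) - V^t_{\beta,HP}(\tau) = 0$, we have then that $\tau$ is equal to

$$\frac{(C_L - C_{HN}) + p_F (C_{HA} - C_{HN})\big[\frac{\beta}{1 - \beta} - \eta\big] + \frac{\beta}{1 - \beta}(C_{HN} - C_L)}{(C_{HA} - C_{HN})[1 - \eta]}. \tag{89}$$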

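Case 3: $T(\tau) < \tau$ and $T'(\tau) < \tau$. In this case,

$$V^{t+1}_\beta(T(\tau)) = V^{t+1}_{\beta,HP}(T(\tau)), \tag{90}$$

$$V^{t+1}_\beta(T'(\tau)) = V^{t+1}_{\beta,HP}(T'(\tau)). \tag{91}$$

Setting $V^t_{\beta,LP}(\tau) - V^t_{\beta,HP}(\tau) = 0$, we can find the threshold $\tau$ by the equation

$$\tau = \frac{C_L - C_{HN} + (C_{HA} - C_{HN})\big[\frac{\beta}{1 - \beta}(p_F - p'_F) - (p_F \eta - p'_F \eta')\big]}{(C_{HA} - C_{HN})[\eta' - \eta + 1]}. \tag{92}$$

Case 4: $T(\tau) > \tau$ and $T'(\tau) < \tau$. In this case,

$$V^{t+1}_\beta(T(\tau)) = V^{t+1}_{\beta,LP}(T(\tau)), \tag{93}$$

$$V^{t+1}_\beta(T'(\tau)) = V^{t+1}_{\beta,HP}(T'(\tau)). \tag{94}$$

By setting $V^t_{\beta,LP}(\tau) - V^t_{\beta,HP}(\tau) = 0$, we have that $\tau$ equals

$$\frac{(C_L - C_{HN})\big(1 + \frac{\beta}{1 - \beta}\big) - p'_F (C_{HA} - C_{HN})\big[\frac{\beta}{1 - \beta} - \eta'\big]}{(C_{HA} - C_{HN})[1 + \eta']}. \tag{95}$$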